Abstract

The popularity of Tor as an anonymity system has made
it a frequent target for a variety of attacks. We focus on traffic
correlation attacks, which are no longer solely in the realm of
academic research, given recent revelations about the NSA and
GCHQ actively working to implement them in practice.

Our first contribution is an empirical study that allows 
us to gain a high fidelity snapshot of the threat of traffic 
correlation attacks in the wild. We find that up to 40% of 
all circuits created by Tor are vulnerable to attacks by traffic 
correlation from Autonomous System (AS)-level adversaries, 
42% from colluding AS-level adversaries, and 85% from state- 
level adversaries. In addition, we find that in some regions 
(notably, China and Iran) there exist many cases where over 
95% of all possible circuits are vulnerable to correlation 
attacks, emphasizing the need for AS-aware relay-selection. 

To mitigate the threat of such attacks, we build Astoria, an
AS-aware Tor client. Astoria leverages recent developments in
network measurement to perform path prediction and intelligent
relay selection. Astoria reduces the number of vulnerable
circuits to 2% against AS-level adversaries, under 5% against
colluding AS-level adversaries, and 25% against state-level
adversaries. In addition, Astoria load balances across the Tor
network so as not to overload any set of relays.

I. Introduction 

Tor is a popular anonymity system for users who wish
to access the Internet anonymously or circumvent censorship [?].
The increasing popularity of Tor has recently made
it a high-value target for blocking and denial of service [?],
[?], [43] and traffic correlation attacks to deanonymize
users [?], [?], [?], [?], [37]. Traffic correlation attacks,
which correlate traffic entering the Tor network with traffic
exiting it, are no longer solely in the realm of academic
research, given recent revelations about the NSA and GCHQ
actively working to implement them in practice, in collusion
with Internet Service Providers (ISPs) [?], [?], [?].

Traffic correlation attacks have been shown to be feasible 
and practical for network-level attackers. Specifically, a traffic 
correlation attack may be implemented by any autonomous 
system (AS) that lies on both the path from the Tor client 
to the entry relay and on the path from the exit relay to the 
destination. Previous studies have demonstrated the potential
for this type of attack [?], [?]. Proposed defenses include
relay selection strategies to avoid ASes that are in a position
to launch them. However, recent work [?] has shown that these
strategies perform poorly in practice.

The threat of network-level adversaries has been exacerbated
by a recent study which highlights that the set of ASes
that are in a position to perform traffic correlation analysis
is potentially much larger due to asymmetric routing, routing
instabilities, and intentional manipulations of the Internet’s
routing system [?], [40]. These attacks significantly raise
the bar for relay-selection systems. Specifically, they require
that the relay-selection system be able to accurately measure or
predict network paths in both the forward and reverse directions.
Measuring the reverse path between two Internet hosts is
non-trivial, especially when the client does not have control over
the destination, as is commonly the case for popular Web
services. While solutions for measuring reverse paths have
been proposed [27], they are still not widely deployed or
available.

In this paper, we make contributions in two dimensions. 
First, we quantify the threat posed by these new attacks.
Second, we develop a relay selection method to minimize their 
impact. 

Measuring the threat faced by Tor. We leverage up-to- 
date maps of the Internet’s topology combined with 
algorithmic simulations to predict which ASes are in 
a position to perform traffic correlation analysis on forward 
or reverse paths. We validate this technique and show that it
provides a reasonable estimate of the threat posed by
AS-level attackers. We then augment our analysis with techniques
to identify ASes owned by a single organization (sibling ASes) 
in order to gain a clearer picture of which ASes are likely to 
collude with each other. This provides a more complete picture 
of network-level threats than previous work. In addition, we 
consider the threat from state-level attackers that have insight 
into traffic transiting through all regional ASes. Through these 
techniques and our experiments, we make the following key 
observations: 

• Up to 40% of circuits constructed by the current Tor 
client are vulnerable to network-level attackers. 

• For up to 37% of all sites in our study, when loaded by
the vanilla Tor client from Brazil, China, Germany, Spain,
France, England, Iran, Italy, Russia, and the United States,
the main page request was served via a vulnerable path (i.e., a
path that contained network-level entities in a position
to launch traffic correlation attacks).


• Connections from China were found to be most vulnerable
to network-level attackers, with up to 86% of
all Tor circuits and 56% of all main page requests
to sites in the study being vulnerable to colluding
network-level attackers.

• For up to 8% of the requests generated from China and 
Iran, over 95% of all possible Tor constructed circuits 
were vulnerable to correlation attacks by network- 
level attackers. 

• Reducing the number of entry guards can result in an
increase in vulnerability of Tor circuits. In particular,
we found that using a single guard significantly increases
the threat from traffic correlation attacks, while
the difference between using two and three guards is
marginal.

• State-level attackers are in a position to launch correlation
attacks on up to 85% of all Tor constructed
circuits.


Mitigating the threat of AS-level adversaries. We propose,
construct, and evaluate Astoria, an AS-aware Tor client that
includes security and relay bandwidth considerations when
creating Tor circuits. Astoria is the first AS-aware Tor client
to consider the recently proposed asymmetric correlation attacks
[40]. When there are safe alternatives, Astoria
actively avoids using circuits on which asymmetric correlation
attacks might be launched. It also leverages methods for
identifying sibling ASes [10] when determining whether or not
a given circuit is safe. In the absence of a safe path, Astoria
uses a linear program to minimize the threat posed by any
adversary. Finally, Astoria considers the bandwidth capabilities
of relays while making AS-aware relay selection decisions.
When there are multiple safe relay selections, Astoria aims
to be a good network citizen and distributes load across Tor
relays in the same manner as the vanilla Tor client. Therefore,
in spite of selecting safer relays, Astoria will not overload any
single set of relays.


Paper outline. In Section II, we briefly overview how the
current Tor client performs relay selection and circuit construction,
describe the current state of research in relay selection
for Tor, and introduce our adversary model. In Section III,
we describe the components of our measurement toolkit used
for detecting network-level attackers on Tor circuits. We then
present some interesting results regarding the vulnerability of
Tor constructed circuits and the general potential for attack by
single AS-, sibling AS-, and state-level attackers. In Section
IV, we present the details of our AS-aware client, Astoria. A
performance and security evaluation of Astoria is performed in
Section V. In Section VI, we discuss the known shortcomings
of Astoria and motivate directions for future research on
AS-aware clients. We present our conclusions in Section VII.


II. Background and Motivation 

We now provide background on Tor relay selection, related 
work in this area, and our adversary model. 

A. Tor relay selection 

The Tor anonymity network consists of approximately
6,000 relays (Tor routers). Most requests made through a Tor
client are sent to their destination via a three-hop path known
as a circuit. Each circuit consists of an entry, middle, and exit
relay. The entry-relay communicates directly with the client
and the exit-relay communicates with the destination. The
fundamental idea is that no single relay in the circuit learns
both the source and the destination.

In its early days, Tor selected relays for each circuit hop
uniformly at random from the set of available relays. This was
changed in order to improve performance (by preferring to
route through higher bandwidth relays [?]) and security [?].
In today’s Tor network, based on certain performance characteristics
such as reliability, bandwidth served, and up-time,
relays may earn certain flags that make them a preferential
choice for various roles during circuit construction.

One such flag is the guard flag. New relays joining the
Tor network are monitored for stability and performance via
remote measurements for a period of up to eight days [?]. At
this point, relays that have demonstrated stability and reliability
are assigned a guard flag. Relays with a guard flag earn the
ability to serve as the entry-relay to the Tor network. By default,
the Tor client selects three guards to be used as entry-relays
for all circuits for a prolonged period of time. The main ideas
behind the selection of a fixed set of entry-relays are (1) to
reduce the possibility that a client will select an entry- and
exit-relay operated by the same entity (after prolonged use),
(2) to prevent attacker-owned entry-relays from denying service
to clients that are not also using an exit-relay owned by the
attacker, and (3) to increase the cost to an attacker that wishes
to be chosen as an entry-relay, by requiring them to earn the
guard flag [?].

In addition to picking relays that are more stable and
reliable, for other locations on a circuit, the Tor client also
requires that (1) no two routers on a circuit share the same /16
subnet and (2) no routers in the same family (as advertised by
the router) may be chosen on the same circuit [?].
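
The two circuit-level constraints above can be checked mechanically. The following is a minimal sketch, not Tor's actual implementation: the `Relay` record and the `circuit_allowed` helper are hypothetical names, only IPv4 addresses are handled, and a family is simplified to a set of shared labels rather than Tor's mutually declared family lists.

```python
import ipaddress
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Relay:
    # Hypothetical record; real Tor descriptors carry many more fields.
    nickname: str
    ip: str
    family: frozenset = field(default_factory=frozenset)


def same_slash16(ip_a: str, ip_b: str) -> bool:
    """True if both IPv4 addresses fall in the same /16 subnet."""
    net_a = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


def same_family(a: Relay, b: Relay) -> bool:
    """Simplification: two relays are in one family if they share a label."""
    return bool(a.family & b.family)


def circuit_allowed(relays) -> bool:
    """Reject a circuit if any pair of relays shares a /16 or a family."""
    return not any(
        same_slash16(a.ip, b.ip) or same_family(a, b)
        for a, b in combinations(relays, 2)
    )
```

For example, a circuit whose entry and middle relays sit at 10.1.2.3 and 10.1.200.9 would be rejected under this sketch, because both addresses belong to 10.1.0.0/16.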

B. Related work 

The threat of correlation attacks by AS-level adversaries on
the Tor network was first identified and empirically evaluated
by Feamster and Dingledine [?] in 2004, when the Tor
network had only 33 relays and significantly different relay
selection algorithms. The study revealed that 10-30% of all
circuits constructed by Tor had a common AS that could
observe both ends of the circuit. Shortly after, by constructing
efficient traffic correlation attacks while considering
network-level adversaries, Murdoch and Danezis, and Murdoch
and Zielinski [?], demonstrated that the threat from AS-level
attackers was one of practical concern. In 2009, Edman and
Syverson [?] found that the threat of AS-level adversaries
had not diminished since [?], in spite of revised relay selection
strategies and a substantially larger number of relays in the
network.

In addition, Edman and Syverson [?] were the first to
consider threats from network-level attackers due to the asymmetric
nature of Internet routing. Using the 2009 topology of
the Internet, AS paths inferred by Qiu’s algorithm [?], and
AS relationships inferred by Gao’s algorithm [20], they found
that in their experiments up to 39% of all Tor circuits were
vulnerable to network-level adversaries that performed attacks
on forward- and reverse-paths. Most recently, Vanbever et al.
[40] and Sun et al. [39] presented RAPTOR, an AS-level
attack integrating BGP interception with the first correlation
attack that takes advantage of the asymmetric nature of Internet
routing, to exactly de-anonymize Tor users with up to 90%
accuracy in just 300 seconds. Similarly, Johnson et al. [?] performed
an empirical evaluation of the effect of network-level
adversary bandwidth investment strategies, Tor client location,
and Tor client use (e.g., for IRC, browsing, BitTorrent, etc.).
They found that a network-level adversary could effectively
de-anonymize most Tor users within six months with very
low bandwidth costs. These works emphasize the need for Tor
relay selection strategies to consider ASes that lie on
the forward- as well as the reverse-paths between the (client, entry) and
(exit, destination) pairs.

Fig. 1: Standard and reverse-path traffic correlation attacks. In the
standard traffic correlation attack, AS2 must observe the direction of
the connection that data is flowing on (forward path). In the
reverse-path traffic correlation attack, AS2 can infer the data flow using ACK
numbers on the reverse path.

Perhaps most closely related to our work, in terms of
end-goals and evaluation methodology, Akhoondi et al. [?]
constructed LASTor, a Tor client which explicitly considered
AS-level attackers and relay locations while constructing Tor
circuits. While LASTor appeared to successfully reduce path
latencies and the probability of common ASes at either end of
the Tor circuits, it neglected the capacity of relays selected by
the system. Relay capacity is an important variable to consider
to ensure that custom relay selection schemes do not overload
a small set of relays, thereby reducing the performance of
the entire network. Their evaluation, based on only HTTP
HEAD requests (as opposed to complete webpage loads), did
not stress the system sufficiently to reveal the issues associated
with capacity-agnostic relay selection. Further, LASTor does
not consider an adversary that may (1) collude with other ASes
or operate at the state-level, and/or (2) only need to be on one
of the asymmetric path segments between the source and
entry-relay, and between the exit-relay and destination (e.g., RAPTOR).

C. Adversary model 

In the standard view of traffic correlation attacks, an AS
needs to lie on the forward path¹ between the source and
destination (i.e., on the solid green colored path segments in
Figure 1(a)). From this vantage point, the adversary (AS2) can view
the packet sizes and timings as transmitted from the source to
destination, going into and coming out of the Tor network, and
directly perform a traffic correlation attack.

¹Here we use ‘forward path’ to refer to the direction of data flow in the
TCP connection.


However, recent work by Vanbever et al. [40] and Sun et
al. [39] highlights the fact that an adversary on the reverse
path may also learn packet size and timing information via
the TCP Acknowledgement (ACK) field. Figure 1(b) illustrates
this case. AS2 can directly observe packet timings between the
source and entry-relay AS (Entry AS), but can only observe
ACKs from the destination back to the exit-relay AS (Exit
AS).

In this view, an adversary has the potential to launch
a traffic correlation attack on a Tor circuit as long as the
following criteria are satisfied:

Let $P_{src \to entry} = \{AS_1, AS_2, \ldots, AS_n\}$ be the set of
ASes on the path from the source (Tor client) to the selected
entry-relay (this set includes the entry-relay AS),
$P_{entry \to src} = \{AS'_1, AS'_2, \ldots, AS'_m\}$ be the set of ASes on the path from the
entry-relay back to the source, and
$P_{src \leftrightarrow entry} = P_{entry \to src} \cup P_{src \to entry}$.
We similarly define paths to and from the exit-relay
and destination (e.g., a popular content provider, or other
Web service) as $P_{exit \to dst}$, $P_{dst \to exit}$, and $P_{exit \leftrightarrow dst}$.

We say that a Tor circuit is vulnerable to a traffic correlation
attack if there exists an AS $A_i$ such that:

$$A_i \in \{P_{src \leftrightarrow entry} \cap P_{exit \leftrightarrow dst}\} \qquad (1)$$
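
Equation (1) amounts to a set intersection over the ASes seen on either direction of each end of the circuit. The following is a minimal sketch under that reading; the function names are illustrative and the AS numbers in the usage note are made up.

```python
def bidirectional_as_set(forward_path, reverse_path):
    """ASes on either the forward or the reverse path between two hosts."""
    return set(forward_path) | set(reverse_path)


def vulnerable_ases(src_entry_fwd, src_entry_rev, exit_dst_fwd, exit_dst_rev):
    """ASes satisfying Eq. (1): present on both ends of the circuit."""
    entry_side = bidirectional_as_set(src_entry_fwd, src_entry_rev)
    exit_side = bidirectional_as_set(exit_dst_fwd, exit_dst_rev)
    return entry_side & exit_side


def is_vulnerable(src_entry_fwd, src_entry_rev, exit_dst_fwd, exit_dst_rev):
    """True if at least one AS can observe both ends of the circuit."""
    return bool(
        vulnerable_ases(src_entry_fwd, src_entry_rev, exit_dst_fwd, exit_dst_rev)
    )
```

With forward paths `[64500, 64501]` and `[64510, 64511]` and reverse paths `[64501, 64502, 64500]` and `[64511, 64502, 64510]`, the circuit is flagged because AS 64502 appears only on the two reverse paths. A check restricted to forward paths would miss it, which is the asymmetric case the adversary model is meant to capture.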

Similar to prior work on relay selection, we assume that our
adversary is an autonomous system (AS), or an entity working
with the cooperation of ASes (e.g., governments). However,
while all previous work only considers the standard view of
network attacks, we also consider attackers that may lie on the
reverse-path, as described above. In addition, we also include
the possibility that some sets of ASes may collude with each
other to de-anonymize Tor users. Specifically, we consider that
an AS may collude with sibling ASes [10] (i.e., other ASes
owned by the same organization) and ASes that may collude
with each other on behalf of a state-level adversary. Finally,
as part of our relay selection algorithms (Section IV), we
consider a probabilistic relay selection strategy that minimizes
the amount of traffic that is observable by any single attacker
over a period of time.

III. Measuring Adversary Presence 

In this section, we investigate the prevalence of the adversary
described in Section II-C. First, we detail how prediction
of AS paths between a source and a destination is performed
and how sets of potential attacking ASes are generated. Then
we present the experimental methodology used to make these
measurements. Finally, we present the results of these experiments.

A. Predicting potential attacker ASes 

Adversaries that can exploit asymmetric routing present a
challenge to measuring their prevalence. The addition of potential
attackers on the reverse-path between a source and destination
implies the need for identifying potential attackers (i.e.,
ASes) on the reverse-paths between the client and entry-relay
(and the exit-relay and destination). This poses a challenging
measurement problem, since reliably measuring information
about reverse-paths is currently not possible. While Reverse
Traceroute [27] would be a useful tool for these measurements,
it is currently not widely deployed.

Additionally, since our measurement toolkit was assembled
with the goal of integration with our Tor client, Astoria
(Section IV), using external measurement and control-plane
mapping tools was not an option. This is because such tools
require knowledge of the client’s intended destination, an
undesirable requirement for an anonymity tool such as Tor. Thus,
any measurement or path prediction needs to be performed on 
the Tor client without leaking any information to attackers or 
third party tools and service providers. 

To address the challenges of reliably measuring reverse-paths
or using control-plane mapping tools, we employ an
efficient path prediction approach which leverages up-to-date
maps of the AS-level Internet topology [?] and algorithmic
simulations that take into account a common model of routing
policies.

AS-level topology. We perform path prediction using an
empirically-derived AS-level Internet topology. In this abstraction,
the Internet is represented as a graph with ASes as
nodes and edges as connections between them. Connections
between ASes are negotiated as business arrangements and
are often modeled as two main types of relationship:
customer-provider, where the customer pays the provider for data sent
and received; and settlement-free peering or peer-peer, where
two ASes agree to transit traffic at no cost [?].

However, in practice AS relationships may violate this
simple taxonomy, e.g., ASes that agree to provide transit for
a subset of prefixes (partial transit) or ASes that have different
economic arrangements in different geographic regions
(hybrid relationships). It can also be the case that two
ASes are controlled by the same organization, e.g., because
of corporate mergers such as Level 3 (AS3356) and Global
Crossing (AS3549), or organizations that use different AS
numbers in different regions such as Verizon (AS701, 702,
703). Additionally, integrating IXPs is a complicated research
subject due to a dearth of measurement data to inform how
they should be incorporated; e.g., just because two ISPs peer
at an IXP does not mean all paths including these ISPs will
traverse the IXP. The AS-level topology we leverage takes
partial transit and hybrid relationships into account, but ignores
IXPs (including them would result in a significant over-estimation
in our measurements, due to their peering meshes). We use
techniques discussed and validated by Anwar et al. [10] for
detecting sibling ASes. This is done to identify ASes that are
likely to collude with each other.

Routing policies. Routing on the AS-graph deviates from 
simple shortest path routing because ASes route their traffic 
based on economic considerations. We use a standard model 
of routing policies proposed by Gao and Rexford [?]. The
path selection process can be broken down into the following 
ordered steps: 

• Local Preference (LP). Paths are ranked based on
their next hop: customer is chosen over peer, which
is chosen over provider.

• Shortest Paths (SP). Among the paths with the highest
local preference, prefer the shortest ones.

• Tie Break (TB). If there are multiple such paths, node
a breaks ties: if b is the next hop on the path, choose
the path where the hash H(a, b) is the lowest.²

Fig. 2: Illustration of the AS paths that the client needs to predict.
Note that these paths must be predicted for each potential entry and
exit relay in both the forward and reverse direction.

This standard model of local preference [?] captures the
idea that an AS has incentives to prefer routing through a 
customer (that pays it) over a peer (no money is exchanged) 
over a provider (that it must pay). 
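
The ordered LP, SP, and TB steps can be expressed as a single sort key. The sketch below is an illustration rather than the simulator used in this paper: `relationship` is a hypothetical lookup from a (node, next hop) pair to how the node sees that neighbor, and a SHA-256 digest stands in for the hash H(a, b).

```python
import hashlib

# Lower rank means a more preferred next-hop relationship (LP step).
LP_RANK = {"customer": 0, "peer": 1, "provider": 2}


def tie_break_hash(a, b):
    """Stand-in for H(a, b): a deterministic hash of (node, next hop)."""
    digest = hashlib.sha256(f"{a}-{b}".encode()).hexdigest()
    return int(digest, 16)


def rank_key(node, path, relationship):
    """Sort key applying LP, then SP, then TB, in that order.

    `path` is a list of ASes that starts at `node`, so path[1] is the next hop.
    """
    next_hop = path[1]
    return (
        LP_RANK[relationship[(node, next_hop)]],  # LP: customer < peer < provider
        len(path),                                # SP: shorter is better
        tie_break_hash(node, next_hop),           # TB: lowest hash wins
    )


def best_path(node, candidate_paths, relationship):
    """Return the single path that `node` would select among the candidates."""
    return min(candidate_paths, key=lambda p: rank_key(node, p, relationship))
```

Because the key is a tuple, a longer path through a customer is still chosen over a shorter path through a provider, which is the economic preference the model is meant to capture.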

In addition to selecting paths, ASes must determine which 
paths they will announce to other ASes based on export 
policies. The standard model of export policies captures the 
idea that an AS will only load its network with transit traffic
if its customer pays it to do so [?]:

• Export Policy (EP). AS b announces a path via AS c
to AS a iff at least one of a and c is its customer.
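
The export rule reduces to a one-line predicate. The following is a minimal sketch with hypothetical helper names, where `customers` is the set of ASes that pay AS b for transit.

```python
def should_export(customers, a, c):
    """AS b announces a path via c to a iff a or c is one of b's customers."""
    return a in customers or c in customers


def export_targets(neighbors, customers, next_hop):
    """Neighbors of b that learn about a path whose next hop is `next_hop`."""
    return {
        a
        for a in neighbors
        if a != next_hop and should_export(customers, a, next_hop)
    }
```

Under this sketch, a route learned from a customer is announced to every other neighbor, while a route learned from a peer or provider is announced only to customers.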

Computing paths following these policies using simulation
platforms (e.g., CBGP [?]) can be computationally expensive,
which limits the scale of analysis. Thus, we employ an
algorithmic approach that allows us to compute all paths
to a given destination in $O(|V| + |E|)$, where $|V|$ is the number
of ASes and $|E|$ is the number of edges.

Predicting paths. We use the routing policies and algorithmic
simulations [?] as described above to compute routes between
pairs of ASes using the AS-level topology published by
CAIDA. AS-level path prediction between a source and
destination is a thorny issue; for example, the recent work from
Juen et al. [?] shows that the paths predicted by BGP-based
path prediction vary significantly from traceroute-based path
prediction. However, our BGP-based path prediction toolkit
makes use of the state of the art in path inference and
AS-relationship inference, both of which have been extensively validated
with empirical measurements by Anwar et al. [10] and Giotsas
et al. [?].

In particular, Anwar et al. [10] show that 65-85% of
measured paths are in the set of paths which satisfy LP and SP.
Thus, we modify the algorithmic simulator to return all paths
satisfying LP and SP simultaneously, instead of using TB to
produce a unique path. We then consider the set of ASes in
the set of paths satisfying LP and SP between a and b to be
the set $P_{a \leftrightarrow b}$.
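
Dropping the tie-break step means keeping every path that ties on local preference and length, and then pooling the ASes those paths traverse. The sketch below illustrates that idea under a simplifying assumption: LP and SP are evaluated only at the source node over an explicit candidate list, whereas a full simulator applies them at every AS. The names `lp_sp_paths` and `as_set` are illustrative.

```python
# Lower rank means a more preferred next-hop relationship.
LP_RANK = {"customer": 0, "peer": 1, "provider": 2}


def lp_sp_paths(node, candidate_paths, relationship):
    """All candidate paths tied for best on (local preference, length)."""
    def key(path):
        # path[0] is `node`; path[1] is the next hop used for LP.
        return (LP_RANK[relationship[(node, path[1])]], len(path))

    best = min(key(p) for p in candidate_paths)
    return [p for p in candidate_paths if key(p) == best]


def as_set(paths):
    """Union of the ASes appearing on any of the given paths."""
    return set().union(*paths)
```

Returning the union over all tied paths trades precision for coverage: the resulting AS set may include ASes that the real route does not traverse, but it is less likely to miss one that it does.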

Identifying vulnerable circuits. Let $p^i_{src \leftrightarrow entry}$ be the $i$-th
LP and SP satisfying (forward- or reverse-) path between the
source and entry-relay, $p^j_{exit \leftrightarrow dst}$ be the $j$-th such path between
the exit and destination,
$\mathcal{P}_{src \leftrightarrow entry} = \bigcup_i \{p^i_{src \leftrightarrow entry}\}$, and
$\mathcal{P}_{exit \leftrightarrow dst} = \bigcup_j \{p^j_{exit \leftrightarrow dst}\}$.
We refer to $\mathcal{P}_{a \leftrightarrow b}$ as the path-set
between a and b.

²In practice, this is done using the distance between routers and router
IDs. Since we do not incorporate this information in our model, we use a
randomized tie break which prevents certain ASes from “always winning”.

Fig. 3: Fraction of actually vulnerable paths from all possible paths,
for each of 20,000 circuits marked as vulnerable by our toolkit.

Since it is currently not possible to predict exactly which
path from $\mathcal{P} = \mathcal{P}_{src \leftrightarrow entry} \times \mathcal{P}_{exit \leftrightarrow dst}$ will be utilized when
using a circuit with entry-relay entry and exit-relay exit,
we label all paths $p \in \mathcal{P}$ as vulnerable iff at least one of
the paths in $\mathcal{P}$ is vulnerable (as defined in Eq. 1). That is,
once our path prediction toolkit returns the set of ASes that
occupy each path-set between the Tor client and a given
entry-relay ($\mathcal{P}_{src \leftrightarrow entry}$) and between the exit-relay and destination
($\mathcal{P}_{exit \leftrightarrow dst}$), potential circuits using the corresponding entry-
and exit-relay are labeled as vulnerable iff there are common
or sibling ASes on the (client, entry-relay) and (exit-relay,
destination) path-sets, i.e., $\{\mathcal{P}_{src \leftrightarrow entry} \cap \mathcal{P}_{exit \leftrightarrow dst}\} \neq \emptyset$.
This provides an estimate of the threat posed by network-level
attackers.

To understand the tightness of this estimate, we analyzed
the fraction of the actually vulnerable paths in each of 20,000
unique “vulnerable” circuits generated by our experiments.
Figure 3 shows the result of this analysis. 25% of all circuits
had all their paths in $\mathcal{P}$ vulnerable to at least one
network-level attacker, and 56% of all circuits had at least 50% of their
paths (in $\mathcal{P}$) vulnerable to at least one network-level attacker.
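
The labeling rule and the tightness analysis can both be read off the product of the two path-sets. The following is a minimal sketch, assuming each path is given as a list of AS numbers and that sibling ASes are described by a hypothetical `sibling_group` mapping from AS number to an organization label; it is not the toolkit used in this paper.

```python
from itertools import product


def shares_adversary(path_a, path_b, sibling_group=None):
    """True if the two paths have a common AS or ASes of one organization."""
    sibling_group = sibling_group or {}
    # Replace each AS by its organization label when one is known.
    orgs_a = {sibling_group.get(asn, asn) for asn in path_a}
    orgs_b = {sibling_group.get(asn, asn) for asn in path_b}
    return bool(orgs_a & orgs_b)


def vulnerable_fraction(entry_paths, exit_paths, sibling_group=None):
    """Fraction of (entry-side, exit-side) path pairs that are vulnerable.

    Assumes both path-sets are non-empty.
    """
    pairs = list(product(entry_paths, exit_paths))
    hits = sum(shares_adversary(p, q, sibling_group) for p, q in pairs)
    return hits / len(pairs)


def circuit_labeled_vulnerable(entry_paths, exit_paths, sibling_group=None):
    """Conservative label: vulnerable if any path pair is vulnerable."""
    return vulnerable_fraction(entry_paths, exit_paths, sibling_group) > 0
```

As a usage example, with entry-side paths `[[100, 3356], [100, 200]]`, exit-side paths `[[300, 3549], [300, 400]]`, and AS3356 and AS3549 mapped to the same organization, one of the four path pairs is vulnerable. The circuit is therefore labeled vulnerable even though its vulnerable fraction is only 0.25.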

B. Measurement methodology and results 

To understand the threat posed by the adversary described
in Section II-C, we performed several experiments. In particular,
our goal was to understand the threat faced by the Tor client 
under various configurations, and in different network and 
geographic locations. 

Experimental setup. In our experiments, we consider the 
fact that Tor users in different countries face different levels 
of threats from local ASes. To this end, each experiment 
was performed in 10 different countries: Brazil (BR), China 
(CN), Germany (DE), Spain (ES), France (FR), England (GB), 
Iran (IR), Italy (IT), Russia (RU), and the United States 
(US). This list was obtained by considering the intersection
of the number of Tor users in each country [42] and the
Freedom House rankings for Internet freedom [19]. In order
to completely understand the threats faced by Tor users, five
experiments were conducted in each country; a summary of
each experiment is shown in Table I.


Vulnerable                 Vanilla Tor    Uniform Tor
Websites (Main request)    37%            35%
Websites (Any request)     53%            69%
Circuits (All requests)    40%            39%

TABLE II: Summary of threat from asymmetric correlation
attacks against the vanilla Tor and uniform relay-selection strategies
for 200 websites in 10 countries.


For each experiment, 200 websites were loaded using the
Selenium Firefox webdriver. The list of 200 websites
comprised the local Alexa Top 100 sites [?] and 100
sensitive (i.e., likely to be blocked) pages obtained from the
Citizen Lab testing list repository [?] for each country.

Each experiment was conducted in one of two settings: 
Live or Simulation. In the Live setting, the actual client (vanilla 
Tor or Astoria) being studied was used to load pages from 
within the respective country using a single VPN as the vantage 
point. The VPN vantage point only presents a limited picture 
of the threat faced by all users in the country (since it only 
considers a single AS as the client location (source AS)), thus 
we used simulations to augment the Live experiments. Each 
simulation considered clients located in 100 randomly selected 
ASes in each country. 

For each experiment, logs were maintained to track: (1) the 
list of available entry- and exit-relays during circuit construc¬ 
tion, (2) the actual chosen entry and exit-relay for each circuit 
constructed by the client, and (3) the list of requests made for 
each site and the circuit used by the Tor client to serve the 
request. Data from these logs were fed to our measurement 
toolkit in order to identify (1) the set of attackers that threaten 
actually constructed circuits (Live experiments) and (2) the 
set of attackers that threaten potential circuits - i.e., circuits 
that could have been constructed given a particular valid 
combination of available entry- and exit-relays (Simulation 
experiments). 

E1: Measuring vulnerability to network-level attacks. This
experiment was conducted using the vanilla Tor client and a
modified Tor client using a uniform relay-selection strategy.
Both clients used the same VPN in each of the 10 countries
to load their corresponding Alexa top 100 and 100 sensitive
pages. Three statistics were measured: (1) the number of
websites for which the circuit carrying the main page request
was vulnerable, (2) the number of websites for which
any circuit was vulnerable, and (3) the total
number of vulnerable circuits.

A summary of these results is shown in Table II. We
see that both clients have a similar number of compromisable
circuits; however, the vanilla Tor client allows 16% more websites
to load without having any of their circuits compromised,
implying that when a website is loaded with the vanilla Tor
client it is either completely safe or has most of its content
loaded via a vulnerable circuit. This is due to the fact that,
unlike the modified Tor client, the vanilla Tor client reuses a
small number of circuits for many requests.

We break down our results for the vanilla Tor client by
country in Figure 4. The figure shows the percentage of
websites that are vulnerable to asymmetric correlation attacks
on circuits built for serving the request for their main page
(GET) and for serving any request. We find that the threat
is not uniformly spread. Clients using the vanilla Tor client
from our VPN vantage point in three countries, China (CN),
Russia (RU), and the United States (US), were found to be most
vulnerable. This can be explained by the fact that, of our 10
countries, the US, RU, and CN had the largest amount of locally
hosted content (i.e., content hosted within the country). Of the
200 sites used for each of the countries, 95% (US), 57% (RU),
and 47% (CN) made requests to ASes within the country itself,
making it more likely for the same AS to be on the paths between
the client and entry-relay and between the exit-relay and destination.

ID  Question Answered                               Vantage Point         Setting                              Results
E1  How vulnerable are circuits to asymmetric       VPN                   Live (3 guards)                      Figures 14a and 14b
    correlation attacks?
E2  How many attacker-free paths are available to   100 ASes per country  Simulation (all entry- and          Figures [?] and [?]
    the vanilla Tor client in each country?                               exit-relays)
E3  How much of a threat do colluding sibling       VPN                   Live (3 guards)                      Figures [?], 14c, and 14d
    ASes pose?
E4  How much of a threat do state-level attackers   VPN                   Live (3 guards)                      Figures 6, 14e, and 14f
    pose?
E5  Do guard settings have a significant effect on  100 ASes per country  Simulation (20 guard-sets of 1, 2,   Figure 9
    the availability of attacker-free paths to the                        and 3 guards and all exit-relays)
    vanilla Tor client?

TABLE I: Summary of security experiment settings used for the evaluation of the vanilla Tor client and Astoria. For each country, all experiments
used a dataset containing the local Alexa Top 100 and 100 locally sensitive websites (obtained from the Citizen Lab testing repository [?]).

Fig. 4: An estimate of the percentage of websites that have main page
requests and any requests serviced by a vulnerable Tor circuit.


E2: Measuring fraction of available attacker-free paths. 

Since the results of our experiments on the live Tor network
were highly dependent on the location of the VPN, simulations
were required to understand the distribution of threat in other
locations within each country. To this end, for each country,
100 ASes were randomly selected as client locations, and the
targets of each of the requests generated by the 200 sites
(sensitive and popular) for each of our 10 countries were used
as destinations. The simulation toolkit generated a list of all
entry- and exit-relays available to each client for performing
the page load (using Tor client consensus data).


Each generated (source, entry, exit, destination) combination
was then analyzed for the threat of attackers to understand
how many "safe" or "attacker-free" entry-exit pairs were
available. Figure 5 shows the cumulative distribution
function of the fraction of attacker-free entry-exit pairs for
each source-destination pair. Figure 5a shows this for the five
most vulnerable countries in our study, and Figure 5b shows
this for the remaining countries.
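As a rough illustration of the per-pair analysis described above, the following sketch counts attacker-free (entry, exit) options for one (source, destination) pair. It assumes a circuit is unsafe when some AS appears on both the client-side and destination-side paths; the function name, relay names, and AS numbers are hypothetical and are not taken from the authors' toolkit.

```python
from itertools import product


def attacker_free_fraction(entry_side_ases, exit_side_ases):
    """Fraction of (entry, exit) pairs with no AS common to both sides.

    entry_side_ases: dict mapping entry-relay -> set of ASes on the forward
        and reverse paths between the client and that entry-relay.
    exit_side_ases: dict mapping exit-relay -> set of ASes on the forward
        and reverse paths between that exit-relay and the destination.
    """
    pairs = list(product(entry_side_ases, exit_side_ases))
    if not pairs:
        return 0.0
    safe = [
        (en, ex) for en, ex in pairs
        if not (entry_side_ases[en] & exit_side_ases[ex])
    ]
    return len(safe) / len(pairs)


# Hypothetical toy topology (AS numbers are made up for illustration).
entry_paths = {"guard_a": {100, 200}, "guard_b": {100, 300}}
exit_paths = {"exit_x": {200, 400}, "exit_y": {500}}
fraction = attacker_free_fraction(entry_paths, exit_paths)
```

In this toy case only (guard_a, exit_x) shares an AS (200) across both sides, so three of the four options count as attacker-free.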


China (CN) and Iran (IR) stand out as the most interesting
cases. First, we see that 8% of all source-destination pairs have
fewer than 10% of their entry-exit options being safe. Next, we
also notice that there are no known attackers present on 18%
of all source-destination pairs. This appears to indicate that
the threat of de-anonymization is non-uniform even within a
country, with certain client locations being much safer than
others.

In order to understand which set of websites is more
vulnerable in each of the countries, Figure 6 shows the
percentage of source-destination pairs having fewer than 5%
safe circuit options for each set of websites. We find that in
all cases, the Alexa top 100 local websites have fewer safe
circuit options. This can be explained by the fact that locally
popular websites are likely to be hosted within a regional AS.
Additionally, we find that China and Iran have a significant
number of source-destination pairs with fewer than 5% safe
circuit options: over 8% of their source-destination pairs
have fewer than 5% of all their circuit options being safe
from network-level correlation attacks.

However, in general, the results of E1 and E2 indicate
that although in most cases there are many safe entry-exit
options available to the Tor client, it often does not select
these options, leading to a large number of vulnerable circuits
being created.


Fig. 6: (Logscale) Percentage of (source, destination) pairs having
fewer than 5% attacker-free (entry, exit) options in each country.
Series: Alexa Local 100 and Citizen Lab 100.


E3: Measuring the impact of sibling ASes. In this experiment
we consider the possibility that ASes owned by the same
organization (referred to as sibling ASes) may collude with
each other in order to de-anonymize Tor users via asymmetric
correlation attacks. We use data gathered by Anwar et al. |TQ|
to identify such ASes. The same setup as E1 was used.
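One way to picture the sibling check is to map every AS to its owning organization before comparing the two sides of a circuit. The sketch below assumes such an AS-to-organization mapping is available (for example, derived from a sibling dataset); the mapping, names, and AS numbers here are hypothetical.

```python
def expand_with_siblings(path_ases, as_to_org):
    """Replace each AS by the organization that owns it.

    ASes missing from as_to_org are treated as their own organization.
    """
    return {as_to_org.get(asn, f"AS{asn}") for asn in path_ases}


def vulnerable_to_siblings(entry_side, exit_side, as_to_org):
    """True if a single organization can observe both ends of the circuit."""
    entry_orgs = expand_with_siblings(entry_side, as_to_org)
    exit_orgs = expand_with_siblings(exit_side, as_to_org)
    return bool(entry_orgs & exit_orgs)


# Hypothetical mapping: AS 200 and AS 201 belong to the same organization.
as_to_org = {200: "org_telco", 201: "org_telco"}
entry_side = {100, 200}
exit_side = {201, 400}

single_as_hit = bool(entry_side & exit_side)
sibling_hit = vulnerable_to_siblings(entry_side, exit_side, as_to_org)
```

Here no single AS sits on both sides, so the circuit passes the single-AS check, but it is flagged once the two sibling ASes are treated as one observer.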

We observe from Figure 7 that the increase in threat from
considering sibling ASes is marginal. Over the 10 countries,
only 3% additional websites from our list of 200 for each country
had some request served by a circuit that was vulnerable to
asymmetric attacks by sibling ASes. However, the increase in
threat is not uniform. Clients in Brazil and Germany face an
8-10% increase in vulnerable websites. This can be attributed to
the large telecom conglomerates operating within the countries
- e.g., many paths from our vantage points in Germany and
Brazil were vulnerable to correlation attacks due to transiting
one of the large number of ASes owned by Telefonica (in
Spain) and Durand (in Brazil), respectively.

Fig. 5: Distribution of the fraction of attacker-free circuits for 100 source ASes connecting to 200 websites in 10 different countries of interest
(BR, CN, DE, ES, FR, GB, IR, IT, RU, US). (a) Most vulnerable countries (all websites): BR, CN, IR, RU, US. (b) Least vulnerable countries
(all websites): DE, ES, FR, GB, IT. The x-axis of both panels is the fraction of attacker-free (entry, exit) pairs.
More skewed to the right indicates the availability of more safe circuits.

Fig. 7: An estimate of the percentage of websites that have any
requests served by a vulnerable Tor circuit when considering siblings.

Fig. 8: An estimate of the percentage of websites that have main
page requests or any requests served by a vulnerable Tor circuit when
considering state-level adversaries. Series: main request and any
request, per country (BR, CN, DE, ES, FR, GB, IR, IT, RU, US) and
over all countries.

E4: Measuring the impact of state-level adversaries. In this
experiment we consider the threat that Tor clients face from
state-level adversaries. We assume that a state-level adversary
is able to gain insight into the traffic flowing through all ASes
operating within the state. Therefore, we consider a circuit
originating from country X to be vulnerable if its path to/from
its entry-relay and its path from/to the exit-relay to the
destination contain some AS operating within X. The same setup
as E1 was used for data collection.
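The state-level criterion can be sketched as follows. This reading assumes that a circuit counts as vulnerable when ASes operating within country X appear on both the entry-side and exit-side paths (they need not be the same AS); the AS-to-country mapping and all values are hypothetical.

```python
def vulnerable_to_state(entry_side, exit_side, as_country, country):
    """True if ASes operating in `country` appear on both ends of a circuit."""

    def touches(path_ases):
        # An AS with no known country never matches.
        return any(as_country.get(asn) == country for asn in path_ases)

    return touches(entry_side) and touches(exit_side)


# Hypothetical AS-to-country mapping.
as_country = {100: "X", 200: "X", 300: "Y", 400: "Y"}

both_ends = vulnerable_to_state({100, 300}, {200, 400}, as_country, "X")
one_end = vulnerable_to_state({100, 300}, {400}, as_country, "X")
```

Compared with the single-AS check, any pair of in-country ASes is enough here, which is consistent with the much higher vulnerability reported for this adversary.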

The results are broken down per country in Figure 8. Here,
we see that the situation is quite dire, with 82% of all websites
(over all 10 countries) having their main page served by
a vulnerable circuit. In particular, clients in Brazil, China,
France, Iran, and the United States face the biggest threat from
state-level attacks, with over 95% of their main page requests
being vulnerable to state-level attackers.

E5: Measuring the effect of guards. In this experiment we
consider the effect of the number of guards on the vulnerability
of Tor clients to network-level asymmetric correlation attacks.
For each of our 10 countries, 100 ASes were randomly selected
as client locations and the targets of all the requests generated
by the 200 websites in our earlier experiments were used as
the destinations. The simulation toolkit generated 60 unique
guard-sets (20 each for 3 guards, 2 guards, and 1 guard) in
an identical manner to the vanilla Tor client, and a list of all
exit-relays available to each client for performing the page
load (using Tor consensus data). Each (source, entry, exit,
destination) combination was checked for the presence of our
adversary.



Fig. 9: Distribution of the fraction of attacker-free (entry, exit) pairs 
for vanilla Tor with 3, 2, and 1 guard(s). 


Figure 9 illustrates the effect that reducing the size of the
guard-set has on the fraction of network-level attacker-free
paths available to the Tor client.

While it is known that a smaller number of guards provides
better security against relay-level attackers in the long-term,
we see from the results of this experiment that the effect
is the opposite against network-level adversaries: as the
size of the guard-set decreases, Tor is more likely to select
a circuit vulnerable to network-level asymmetric correlation
attacks due to the reduced number of available safe paths.
In particular, when only 1 guard is used, over 15% of the
(source, destination) pairs in our experiment had no safe
options, whereas the difference in security provided by two
or three guards was marginal. This experiment demonstrates
one of the conflicts between Tor clients geared for defending
against relay-level attackers and those geared for defending
against network-level attackers.

IV. Astoria: An AS- and Capacity-aware Tor Client

Motivated by the observation that vanilla Tor very often
selects entry-exit pairs that may be subject to asymmetric
correlation attacks, we seek to design a relay selection algorithm
to mitigate the opportunities for such attackers. We design our
relay selection system, Astoria, based on the idea of stochastic
relay selection. This works by having the Tor client generate
a probability distribution that minimizes the chance of attack
over all possible entry- and exit-relay selection choices, and
selecting an entry- and exit-relay based on this distribution.
The advantage of stochastic selection is that even if the client
has no safe options, relay-selection can be engineered to
minimize the amount of information gained by the adversary
over some period of time (as we show below). Further, it allows
clients to select relays in such a way that no set of relays in
the Tor ecosystem is overloaded, even if every client uses the
same relay-selection strategy.

A. Astoria goals 

Astoria is constructed with several security and performance
goals in mind:

• Deal with asymmetric attackers. Astoria avoids
constructing circuits involving common ASes on the
forward- or reverse-paths between the client and the
entry-relay and between the exit-relay and the destination.

• Deal with the possibility of colluding attackers.
Astoria considers the threat of ASes that may collude to
de-anonymize Tor users. Astoria can be configured to
build circuits that do not contain ASes known to be
colluding on the forward- or reverse-paths between the
client and entry-relay and between the exit-relay and
destination. This mitigates the threat from sibling ASes and
state-level attackers.

• Consider the worst case possibility. Astoria uses a
probabilistic relay selection algorithm that ensures,
even in the worst case (where there are no safe paths
to and from the entry- and exit-relay), that the ability
of a single AS (or family of ASes) to de-anonymize
a large number of circuits is minimized.




Entry-relay | Uniform | Optimal
Entry / AS1 | 1/3 | 1/4
Entry / AS2 | 1/3 | 1/4
Entry / AS3 | 1/3 | 1/2

Fig. 10: Example of optimizing relay selection. Simplified to
unidirectional paths and only entry-relay selection.


• Minimize performance impact. It is clear that any
AS-aware client will lose its ability to perform many
optimizations such as pre-constructing circuits. Our goal
is to minimize the effect of the above considerations
on the performance of the Tor client.

• Be a good network citizen. Astoria takes into account
the capacities of all relays available in the Tor
ecosystem and performs selection in such a way that no
single set of relays is overloaded, even when all clients in
the network use the same relay-selection strategy.

B. Minimizing information gained by the adversary 

While there are often cases in which some relay selection
completely eliminates the risk from our adversary, we
develop our relay selection to be robust even when this is not
the case. Further, with attacks implemented using BGP hijacking
and interception, the number of unsafe paths may be higher
than what we observe in our analysis (we discuss this more in
Section |^).

To minimize the risk of correlation attacks, we define a
linear program which generates a probability for each relay
selection, with the objective of minimizing the maximum
probability of a circuit encountering the attacker. Recall that in
our adversary model we consider a long-lived adversary, so
minimizing the probability of an attacker may also be seen
as minimizing the number of circuits the adversary is able
to observe over a long period of time and numerous circuit
construction cycles.

Figure 10 shows an example of relay selection to give
intuition about how the LP minimizes the risk from the
attacker. In this example, we consider unidirectional paths and
only entry-relay selection for clarity. In the figure, if the source
were to choose uniformly at random across the three
entry-relays, there is a 2/3 chance that AS1 will be able to observe
traffic and only a 1/3 chance that AS2 will. In this case, the
optimal selection is intuitive: the source should choose
entry-relays 1 and 2 with probability 1/4 each and entry-relay
3 with probability 1/2. This lowers the probability that AS1
can observe a circuit from 2/3 to 1/2. This probability of the
most likely adversary is the quantity that our LP minimizes.
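The arithmetic of this example can be checked with a few lines of code. The sketch assumes, as the stated probabilities imply, that AS1 lies on the paths to entry-relays 1 and 2 and AS2 on the path to entry-relay 3; the dictionary and function names are illustrative only.

```python
from fractions import Fraction

# Assumed from the probabilities in the text: AS1 observes traffic to
# entry-relays 1 and 2, and AS2 observes traffic to entry-relay 3.
observer_of = {"entry1": "AS1", "entry2": "AS1", "entry3": "AS2"}


def exposure(selection):
    """Probability that each AS observes a circuit under `selection`."""
    seen = {}
    for relay, prob in selection.items():
        asn = observer_of[relay]
        seen[asn] = seen.get(asn, Fraction(0)) + prob
    return seen


uniform = {relay: Fraction(1, 3) for relay in observer_of}
optimal = {
    "entry1": Fraction(1, 4),
    "entry2": Fraction(1, 4),
    "entry3": Fraction(1, 2),
}

worst_uniform = max(exposure(uniform).values())  # AS1 sees 2/3
worst_optimal = max(exposure(optimal).values())  # AS1 and AS2 each see 1/2
```

No distribution can do better than 1/2 here, because the exposures of AS1 and AS2 always sum to 1.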

We use the following notation: 

• Let $ADV_{i,j}$ be the set of attackers on the circuit using
entry-relay $i$ and exit-relay $j$ to destination $dest$, i.e.,
$\forall A \in ADV_{i,j} : A \in \{P_{src \leftrightarrow entry_i} \cap P_{exit_j \leftrightarrow dest}\}$.

• Let $X_{i,j,A}$ be an indicator random variable for attacker
$A$ on the circuit using entry-relay $i$ and exit-relay $j$,
i.e., $X_{i,j,A} = 1$ if $A \in ADV_{i,j}$, and 0 otherwise.

• Let $P_{i,j}$ be the probability that a client builds a circuit
using entry-relay $entry_i$ and exit-relay $exit_j$.

The following linear program is used to minimize the 
probability of the most likely attacker (i.e., the number of 
circuits visible to the attacker). 

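$$
\begin{aligned}
\text{minimize} \quad & z \\
\text{subject to} \quad & z \ge \sum_{i,j} \left( P_{i,j} \, X_{i,j,A} \right), \quad \forall A \in ADV_{i,j} \\
& P_{i,j} \in [0, 1], \; \forall i, \forall j; \qquad \sum_{i,j} P_{i,j} = 1
\end{aligned}
\tag{2}
$$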

Essentially, given information about the presence of attackers
(network-level or state-level) for each $P_{source \leftrightarrow i}$ and
$P_{j \leftrightarrow dest}$ path, the linear program seeks to find the probability
distribution ($P_{i,j}$) over available choices of entry- and
exit-relays for which the expected number of circuits visible to
each attacker is minimized. Entry- and exit-relays are chosen
according to this distribution (defined as $D_{lp}$) during circuit
construction.
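To make the min-max objective concrete without depending on an LP solver, the sketch below approximates the same objective with a multiplicative-weights (Hedge) iteration over candidate circuits. This is a swapped-in illustrative technique, not the authors' implementation; a real client would more likely hand the program above to an LP solver. The matrix layout and all names are assumptions.

```python
import math


def minimax_selection(exposes, rounds=5000):
    """Approximate the distribution minimizing the worst attacker's view.

    exposes[c][a] is 1 if attacker a observes circuit c (the indicator X
    in the text) and 0 otherwise. Returns (distribution, worst exposure).
    """
    n_circuits, n_attackers = len(exposes), len(exposes[0])
    eta = math.sqrt(8.0 * math.log(max(n_circuits, 2)) / rounds)
    dist = [1.0 / n_circuits] * n_circuits
    average = [0.0] * n_circuits

    def exposure_of(d, a):
        return sum(d[c] * exposes[c][a] for c in range(n_circuits))

    for _ in range(rounds):
        for c in range(n_circuits):
            average[c] += dist[c] / rounds
        # The most likely attacker under the current distribution.
        worst = max(range(n_attackers), key=lambda a: exposure_of(dist, a))
        # Down-weight every circuit that this attacker observes.
        weights = [
            dist[c] * math.exp(-eta * exposes[c][worst])
            for c in range(n_circuits)
        ]
        total = sum(weights)
        dist = [w / total for w in weights]

    z = max(exposure_of(average, a) for a in range(n_attackers))
    return average, z


# Toy instance: attacker 0 observes circuits 0 and 1, attacker 1 circuit 2.
X = [[1, 0], [1, 0], [0, 1]]
dist, z = minimax_selection(X)
```

On this toy instance the averaged distribution approaches the 1/4, 1/4, 1/2 split of the entry-relay example, with the worst attacker's exposure close to 1/2. The iteration only approximates the LP optimum, and the gap shrinks as `rounds` grows.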

C. Security is not enough 

While our LP produces a relay selection distribution that
minimizes the probability of success across all adversaries,
it does not take into account the resources available at the
selected relays. Given that Tor is a system run using community
resources contributed by volunteers, load balancing users
across these resources is important to ensure that they are
used efficiently and that no single relay or set of relays becomes
overloaded. Figure 12 shows a snapshot of the distribution
of relay capacities available during the period of this study,
for all relays in the Tor system and for the relays selected by a
hypothetical perfect load-balancing Tor client, i.e., one where
each relay serves exactly the amount of traffic that it can handle
(assuming identically sized requests). Here, we see that over
80% of all Tor traffic should be routed through ≈ 35% of all
the relays in the Tor network for every relay to be operating
within its advertised capacity.

In order to achieve load-balancing, we augment our
relay-selection algorithm with information about relay capacities
from the latest Tor consensus during circuit construction. This
is done as follows:

When there are safe entry and exit combinations: In this
case, we select a safe combination according to the distribution
of relay capacities. For example, given a set of safe entry-
and exit-relay combinations $E = \{(en_1, ex_1), \ldots, (en_n, ex_n)\}$
and the distribution of their advertised capacities = $\{en_1,
\ldots, en_n, ex_1, \ldots, ex_n\}$, we select a combination $(en_i, ex_i)$
with probability $P_i$ =

This ensures that no single (entry- or exit-) relay is selected
with probability higher than the ratio of its advertised capacity
to the total advertised capacity of all safe (entry- or exit-)
relays (just as is done by the vanilla Tor client).
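A capacity-weighted draw over safe combinations might look like the sketch below. The exact expression for the selection probability is not reproduced here, so the weighting used (the product of the two relays' advertised capacities) is purely an assumption for illustration, as are the relay names and capacity values.

```python
import random


def pick_safe_pair(safe_pairs, capacity, rng=random):
    """Sample one safe (entry, exit) pair, favoring high-capacity relays.

    ASSUMPTION: each pair is weighted by the product of the advertised
    capacities of its two relays; the paper's own expression may differ.
    """
    weights = [capacity[en] * capacity[ex] for en, ex in safe_pairs]
    return rng.choices(safe_pairs, weights=weights, k=1)[0]


# Hypothetical advertised capacities (arbitrary units).
capacity = {"guard_a": 10, "guard_b": 5, "exit_x": 20, "exit_y": 1}
safe_pairs = [
    ("guard_a", "exit_x"),
    ("guard_a", "exit_y"),
    ("guard_b", "exit_x"),
]

rng = random.Random(0)
draws = [pick_safe_pair(safe_pairs, capacity, rng) for _ in range(2000)]
share_top = draws.count(("guard_a", "exit_x")) / len(draws)
```

With these made-up capacities the pair of the two largest relays carries a weight of 200 out of 310, so it should be drawn in roughly two thirds of the samples.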


When there are no safe entry- and exit-relay combinations:
In this case, in order to correctly minimize the amount of
information gained by the adversary, we strictly obey the
probability distribution output by our linear program described
in the previous section. No attempt is made to balance loads
according to relay capacities. It is important to note that this is
a fairly infrequent case (as shown in experiment E2 in Section III).

D. Implementing Astoria 

The measurement toolkit described in Section III was
integrated with a modified Tor client, as follows.

Integrating our path measurement toolkit with the Tor
client. For standard measurement purposes, the toolkit simply
takes a source and destination address and returns the set of
ASes on the forward- and reverse-paths between the two.

However, in the context of integration with the Astoria
client, it must predict paths to and from each of the
entry-relays for the client's AS, and paths from all exit-relays toward
the destination AS (Figure [^). This results in $|En| + |Ex| + 2$
routing-tree computations, where $|En|$ and $|Ex|$ are the number
of entry and exit relays, respectively. In order to mitigate the
risk of correlation attacks, by default, Tor restricts the number
of entry-relays available to each Tor client to three (called
guards ODx), and there are typically of the order of 1,000
exit-relays available to a client during circuit construction,
resulting in the order of 1,000 routing-tree computations.

Fortunately, since the source AS and entry-relay ASes are
relatively stable, these paths can be precomputed for later use
by the client. (We observe the benefit of this in Section V.)
However, performing relay selection on a per-destination basis
means that pre-building circuits, as is done by the current
implementation of Tor, is no longer feasible.

AS-aware on demand circuits. First, the Tor client was
modified to perform offline IP to ASN mapping using a
database p8| for every incoming request. Note that since the
entire database (9 MB) is downloaded, the client does not
reveal its intended destination to any lookup services.

Next, modifications were made to the way requests were
allocated to circuits. The vanilla Tor client performs pre-emptive
circuit construction in order to serve requests as they arrive
(increasing performance significantly). This is unfortunately
infeasible for an AS-aware client where relay-selection is a
function of the destination. Although one may consider
pre-constructing AS-aware circuits for a set of popular destination
ASes, the performance benefit is marginal, at best. This is
mainly due to the large number of third party requests for less
popular destination ASes embedded in popular Web pages.
Astoria, therefore, only performs on demand circuit construction.
For each incoming request, Astoria first checks if there are
existing circuits serving the same destination AS. The request
is attached to the most suitable such circuit if it exists.
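The on-demand allocation described above can be pictured as a pool of circuits indexed by destination AS. The sketch below is a simplification that keeps a single circuit per destination AS rather than choosing the "most suitable" of several; the class, the callback, and the AS numbers are hypothetical and do not correspond to Tor's internal API.

```python
class CircuitPool:
    """Keeps on-demand circuits indexed by destination AS number."""

    def __init__(self, build_circuit):
        # build_circuit(client_asn, dest_asn) is a hypothetical callback
        # standing in for AS-aware relay selection and circuit setup.
        self._build = build_circuit
        self._by_dest_asn = {}

    def circuit_for(self, client_asn, dest_asn):
        """Reuse a circuit for dest_asn if one exists, else build one."""
        circuit = self._by_dest_asn.get(dest_asn)
        if circuit is None:
            circuit = self._build(client_asn, dest_asn)
            self._by_dest_asn[dest_asn] = circuit
        return circuit


built = []


def fake_build(client_asn, dest_asn):
    """Stand-in builder that records which destination ASes were built."""
    built.append(dest_asn)
    return f"circuit-{len(built)}-to-AS{dest_asn}"


pool = CircuitPool(fake_build)
first = pool.circuit_for(64500, 64510)
second = pool.circuit_for(64500, 64510)  # same destination AS: reused
third = pool.circuit_for(64500, 64511)   # new destination AS: built
```

Only the first request to each destination AS pays for circuit construction, which is why repeated destination ASes within a page load matter for performance.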

Circuit construction. Astoria creates a new circuit if and
only if a request arrives for a destination with no currently
usable circuits. In such cases, the client and destination ASNs
are passed to the circuit construction and relay selection
algorithms. Circuit construction is performed as follows:

• First, a list of entry- and exit-relays meeting the
requirements set by the request is obtained. If the
Tor client is configured to utilize only guards as
entry-relays, the list of guards is obtained. Next, in order
to perform load-balancing, information from the most
recent Tor consensus is obtained to generate the relay
capacity distribution for each entry- and exit-relay
combination.

• The Astoria client performs lookups to the offline
IP-ASN database to map entry- and exit-relay IP
addresses to AS numbers. These, along with the client
and destination AS numbers, are then passed to our
AS-path prediction and attacker measurement toolkit
(Section III).

• The toolkit returns the list of ASes on each forward-
and reverse-path between the client and every potential
entry-relay and between the destination and every
potential exit-relay. In order to improve performance, paths
are cached for frequently queried destinations.
Precomputation or caching of paths between the client and the
high-uptime entry-relays and between destinations and
high-uptime exit-relays also helps improve performance.








• The returned paths are checked for the presence of
common ASes in the entry and exit AS path sets.
If there are paths without an attacker, the linear
program need not be invoked. Instead, Astoria selects
a safe entry- and exit-relay combination according to
the generated capacity-based probability distribution
(described in Section IV-C). We see the impact of this
load-balancing technique in Section V.

• If there are no attacker-free relay combinations, the
linear program is invoked in order to select an entry-
and exit-relay combination according to the distribution
$D_{lp}$ that minimizes the probability of the most
likely attacker (described in Section IV-B).

• Finally, once the entry- and exit-relays are selected
according to either the capacity-based distribution or
the $D_{lp}$ distribution, the circuit is constructed. The
remainder of the circuit construction process remains
unchanged from the vanilla Tor client.
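The branching in the steps above can be summarized in a short sketch. It assumes the AS sets for each relay have already been predicted, uses a product-of-capacities weighting for the safe case (an assumption, as the exact weighting is not reproduced here), and takes the LP-derived distribution as a hypothetical callback named `fallback`.

```python
import random


def choose_relays(entry_side, exit_side, capacity, fallback, rng=random):
    """Pick an (entry, exit) pair and report which distribution was used.

    entry_side / exit_side map each relay to the set of ASes on its side
    of the circuit. `fallback(pairs)` is a hypothetical stand-in for the
    LP and returns one probability per pair, in the same order.
    """
    pairs = [(en, ex) for en in entry_side for ex in exit_side]
    safe = [
        (en, ex) for en, ex in pairs
        if not (entry_side[en] & exit_side[ex])
    ]
    if safe:
        # ASSUMPTION: weight safe pairs by the product of capacities.
        weights = [capacity[en] * capacity[ex] for en, ex in safe]
        return rng.choices(safe, weights=weights, k=1)[0], "capacity"
    probs = fallback(pairs)
    return rng.choices(pairs, weights=probs, k=1)[0], "lp"


def uniform(pairs):
    """Placeholder for the LP output, used only in this example."""
    return [1.0 / len(pairs)] * len(pairs)


capacity = {"g1": 10, "g2": 5, "x1": 20}
rng = random.Random(1)

# One safe option exists: g2 shares no AS with x1.
pair_a, mode_a = choose_relays(
    {"g1": {1, 2}, "g2": {3}}, {"x1": {2, 4}}, capacity, uniform, rng)
# No safe option: AS 2 is on every entry-side and exit-side path.
pair_b, mode_b = choose_relays(
    {"g1": {1, 2}, "g2": {2}}, {"x1": {2, 4}}, capacity, uniform, rng)
```

The capacity-based branch is taken whenever any attacker-free pair exists, so the LP-derived distribution is only consulted in the less common case with no safe options.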


V. Astoria Evaluation 



Fig. 11: CDF of page load times (including circuit creation times) 
for a uniform Tor, vanilla Tor, and Astoria client over 200 websites 
in all 10 countries. 


We evaluate Astoria along multiple axes. First, we consider
the performance of Astoria by measuring the time required
to load webpages and its ability to be a good Tor citizen
by selecting bandwidth-rich relays. Second, we evaluate the
security provided by Astoria. We show that Astoria-constructed
circuits are a good defense against the adversary described in
Section ]^. Finally, we evaluate the threat from attacks by
relay-level adversaries.

A. Evaluation methodology

Similar to our experiments in Section III, we consider the
performance and security of clients in 10 different countries:
Brazil (BR), China (CN), Germany (DE), Spain (ES), France
(FR), England (GB), Iran (IR), Italy (IT), Russia (RU), and the
United States (US). The same 200 webpages as before were
used for page-loads within each country.

In order to understand the performance of Astoria and for
comparison with the vanilla Tor client, three metrics were
computed: (1) page-load time¹, (2) distribution of selected
relay bandwidths, and (3) overhead of path prediction. For each
of these experiments we considered the same experimental
settings as the vanilla Tor client in experiment E1. Logs were
recorded to extract the advertised capacities of all available
relays and of all relays selected by the Astoria and vanilla Tor
clients, and the time required for AS path computation by the
Astoria client.

In order to assess the security of Astoria and for comparison
with the vanilla Tor client, experiments to measure security
against network-level (experiment E1), colluding network-level
(experiment E3), and state-level (experiment E4) asymmetric
correlation attackers were repeated using the Astoria
client for page-loads in the same setting (including using the
same guard-set in each country) as the vanilla Tor client
(Section III). For each experiment, three statistics were computed:
(1) the fraction of websites whose main page requests were
served by vulnerable circuits, (2) the fraction of websites that
had any request served by a vulnerable circuit, and (3)
the total fraction of vulnerable circuits.

B. Performance evaluation

In this section, we evaluate the performance of Astoria
using three metrics: (1) page-load times, (2) distribution of
selected relay bandwidths, and (3) overhead of path prediction.


Page load times. Figure 11 shows the distribution of
page-load times when using the vanilla Tor client, a modified Tor
client with a uniform relay-selection strategy, and the Astoria
client. We see that the median page-load time with the vanilla
Tor client is only 5.9 sec, while the median page-load times
for the Astoria and uniform Tor clients are 8.3 sec and 15.6
sec, respectively. Although this drop in performance from the
vanilla Tor client to Astoria is significant, it can be argued that
there are two main causes for this, both of which are unavoidable
for any AS-aware Tor client: (1) it is no longer possible to
pre-construct and re-use circuits to the same degree as the vanilla
Tor client, and (2) there is a non-negligible amount of time
spent on computing paths and checking for the presence of
attackers on these paths.


¹The Selenium driver.get() method was used to detect the end of
page-loads.

Fig. 12: Distribution of bandwidths of relays selected by vanilla Tor,
uniform Tor, Astoria, and the perfect load balancing client.



Fig. 13: CDF of time spent on AS path computation per site. 


Load balancing. Astoria aims to balance load from clients
across all relays in the Tor network so that no single set of
relays is overloaded. Figure 12 compares the load-balancing
of the Astoria client with that of the vanilla Tor client
and the perfect load-balancing client. We see that in
spite of performing AS-aware relay-selection, Astoria is able
to perform load-balancing at least as well as the vanilla Tor
client, with neither of them achieving a perfect distribution.

The results of this experiment allow us to confirm our 
hypothesis that the reduction in performance from the vanilla 
Tor client to Astoria is indeed because of our inability to 
preconstruct circuits and delays due to path computation, and 
not due to poor relay-selection. 

Overhead of path prediction. Figure 13 shows the CDF of
the total amount of time spent on computing AS paths, for each
site. We see that for about 50% of all sites (200 sites in each of
10 countries), the time spent on path computation is negligible.
This is due to the high frequency of repeated occurrences of
destination ASes in our 200 sites, resulting in the AS path for
each exit-relay to that destination already being in the toolkit's
cache. In 60% of the cases where responses were not cached
(and 86% of the cases, overall), computing AS paths required
under 4 seconds.

C. Security against network-level attackers 

In this section, Astoria is evaluated and compared with the
vanilla Tor client by measuring its success in defending against
various attackers performing asymmetric correlation attacks. A
summary of all results is provided in Table III.

E1: Measuring vulnerability to network-level attacks. In
this experiment, we compare the security provided by the
Astoria client with the vanilla Tor client, against network-level
adversaries. The threat from such adversaries is significantly
reduced, from up to 40% of all circuits being vulnerable to
3%, with the Astoria client. Figures 14a and 14b break down
the results of this experiment by country. We see that Astoria
completely removes the threat of network-level attackers on
circuits carrying the main page request for clients in Brazil,
France, and Iran, while bringing the risk down to under 5% in
six other countries.


E3: Measuring the impact of sibling ASes. We find
that siblings have little impact on the security provided by
Astoria. Over all circuits constructed by Astoria, the addition
of colluding sibling ASes resulted in less than a 3% increase
in the number of vulnerable circuits, with the only significant
increase being in Germany (DE). This is illustrated in Figures
14c and 14d. This large increase in the number of vulnerable
circuits indicates that if sibling ASes in Germany were to
collude, Astoria (given the VPN client location and selected
entry-guards) is often left with no safe entry- and exit-relay
options for circuit construction. It is important to note that
although there are a significant number of vulnerable circuits
created by Astoria, these circuits are constructed using our
linear program (Eq. 2), which minimizes the number of circuits
visible to each attacker.


E4: Measuring the impact of state-level adversaries.
Astoria performs reasonably well even against state-level
adversaries by reducing the fraction of potentially vulnerable circuits
from 85% (vanilla Tor) to 25%, over all countries. The per
country breakdown is illustrated in Figures 14e and 14f. The
results show a steep decrease in the ratio of vulnerable websites
for all countries except the United States (US). This is due to
the large presence of American ASes on paths between our US
VPN vantage point and the entry-guards, and between any Tor
exit-relay and our US destinations.

Defending against active network-level attacks. Astoria
focuses on adversaries who may lie on asymmetric network
paths between the client and entry, and between the exit and
destination, respectively. However, Sun et al. highlight attacks
based not only on static path properties but also on the dynamics
of BGP (e.g., hijacks, routing instability). Taking this sort of attack
into account is challenging as it requires real-time access to
interdomain routing data and intelligent analysis to identify
incidents that may impact the safety of the client's path. In
the future, we plan to integrate subscriptions to BGP hijack
data sources (e.g., Argus p6| , or ongoing efforts at building
a real-time interception detector |T^ ) into Astoria to allow it
to operate on dynamic BGP paths.


D. Security against relay-level attackers 

In order to defend against relay-level attackers, Astoria
inherits the concept of entry-guards from the vanilla Tor client
and also ensures that no two relays from the same family are
placed on the same circuit. However, due to its AS-awareness,
Astoria (and any AS-aware client that constructs circuits which
are a function of the destination AS) is currently vulnerable
to two relay-level attacks: (1) it is possible for a middle-relay
in an Astoria-constructed circuit to narrow down the set of
possible (source, destination) AS pairs that are at either end
of the circuit (based on the selected entry- and exit-relays),
and (2) when Astoria is used from regions with no safe (entry,
exit) relay options, it is possible for a relay-level attacker to
force Astoria to create circuits that can be de-anonymized by
it. Below, we discuss these attacks, their impact, and how to
mitigate them.

Client | Network-level (E1): Websites (Main) / Websites (Any) / Circuits (All) | Colluding network-level (E3): Websites (Main) / Websites (Any) / Circuits (All) | State-level (E4): Websites (Main) / Websites (Any) / Circuits (All)
Astoria | 3% / 8% / 2% | 6% / 13% / 5% | 27% / 34% / 25%
Vanilla Tor | 37% / 53% / 40% | 40% / 56% / 42% | 82% / 88% / 85%

TABLE III: Astoria vs. vanilla Tor: An estimate of the threat faced from various attackers.

Fig. 14: Astoria vs. vanilla Tor: Percentage of websites using vulnerable circuits for their main request or any request, against various adversaries.
Panels: (a) Any request vs. Single AS adversaries [Experiment E1]; (b) Main request vs. Single AS adversaries [Experiment E1];
(c) Any request vs. Sibling AS adversaries [Experiment E3]; (d) Main request vs. Sibling AS adversaries [Experiment E3];
(e) Any request vs. State-level adversaries [Experiment E4]; (f) Main request vs. State-level adversaries [Experiment E4].
Each panel compares vanilla Tor and Astoria per country (BR, CN, DE, ES, FR, GB, IR, IT, RU, US) and over all countries.


Measuring the threat posed by middle-relays. As seen in
Table III, in a majority of all cases, Astoria is able to find
a safe pair of entry- and exit-relays to use for its circuits.
As a result, an adversarial middle-relay working under the
assumption that Astoria always constructs safe circuits will
be able to narrow down the set of possible source- and
destination-ASes by simply observing the entry- and
exit-relays in the circuit. Below, using the results of experiment
E2 and statistical inference techniques, we show that the threat
from such adversarial relays is negligible.


First, given our random sample of 100 source ASes for
each country (and fixed set of destinations), we infer the
mean number of (source, destination) pairs with greater than
50% safe entry- and exit-relay pair options for the entire
population of source ASes in each country (with the same
fixed destinations). Then, we find a lower-bound estimate on
the expected number of (source, destination) AS pairs that have
each (entry, exit) pair as a safe option, i.e., a lower-bound on
the number of (source, destination) pairs that can be linked to
the circuit by a middle-relay in a single observation. Finally,
we show that given the current distribution of Tor relays, the
probability of narrowing down this set of sources to a single
(source, destination) pair is negligible.

Inferring the mean number of (source, destination) pairs
with greater than 50% safe options. Recall that in experiment
E2, 100 source ASes were selected at random from the set
of all ASes in each country. The experiment considers the
destination ASes generated by the loading of 200 non-random
destinations. Let the set of sampled source ASes be denoted
by $X$ and the set of destination ASes be denoted by $D$. From
the results of the experiment, we extract the mean fraction of
$(x \in X, d \in D)$ pairs which have more than 50% safe entry-
and exit-relay options (denoted by $\mu_{X,D}$). Let $\mathcal{X}$
denote the set of all ASes within each country. Now, using the
central limit theorem and the sampling distribution of the sample
means [34], we infer the 99% confidence interval for the mean
fraction of $(x \in \mathcal{X}, d \in D)$ pairs which have more than
50% safe entry- and exit-relay options (denoted by $\mu_{\mathcal{X},D}$).
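The inference step above can be illustrated with a minimal Python sketch. It assumes that each sampled source AS contributes one value, namely the fraction of its destinations for which more than 50% of (entry, exit) pairs are safe, and that the interval is the usual normal-approximation interval around the sample mean. The function name and the example values are hypothetical and are not taken from experiment E2.

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.99):
    """Normal-approximation (CLT) confidence interval for a population mean.

    `samples` holds one value per sampled source AS (an assumption of this
    sketch): the fraction of destinations with more than 50% safe options.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    sample_mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value, e.g. about 2.576 for a 99% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * std_err, sample_mean + z * std_err


if __name__ == "__main__":
    # Illustrative values only: 100 hypothetical per-source fractions.
    fractions = [0.3, 0.5] * 50
    low, high = mean_confidence_interval(fractions)
    print(f"99% CI: ({low:.3f}, {high:.3f})")
```

With 100 sampled sources per country, the interval narrows as the per-source fractions become less variable, which is in line with the tight intervals reported in Table IV.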


Estimating a lower-bound on linkable sources. We take an
extremely conservative approach to derive this lower-bound.
First, we use the lower value of $\mu_{\mathcal{X},D}$ from our 99%
confidence interval. Further, we assume that a $\mu_{\mathcal{X},D}$
fraction of our $(x \in \mathcal{X}, d \in D)$ pairs have only exactly
50% safe entry- and exit-relay options (although $\mu_{\mathcal{X},D}$
denotes the fraction of $(x \in \mathcal{X}, d \in D)$ pairs with
greater than 50% safe options). Finally, we assume that the
remaining $1 - \mu_{\mathcal{X},D}$ fraction of $(x \in \mathcal{X}, d \in D)$
pairs have no safe options. Given these assumptions, we can
compute the lower-bound on the expected number of
$(x \in \mathcal{X}, d \in D)$ pairs which have each (entry, exit)
pair as a safe option (denoted by $E[S_{en,ex}]$) as:

$$E[S_{en,ex}] = .50 \times \mu_{\mathcal{X},D} \times |\mathcal{X}| \times |D|.$$


$E[S_{en,ex}]$ is a lower-bound on the expected number of
linkable source and destination pairs for each observation of
an entry- and exit-relay (under the conservative assumption
that an adversarial middle-relay knows the country in which
the client is located and the set of all possible destinations $D$
that any client may connect to).


Estimating the probability of complete de-anonymization.
Given that $E[S_{en,ex}]$ is the number of $(x \in \mathcal{X}, d \in D)$
pairs that are linkable to a single observation of an (entry, exit)
pair and assuming a constant rate of reduction in linkable
pairs (given by $\frac{E[S_{en,ex}]}{|\mathcal{X}| \times |D|}$), the
number of circuits that need to be observed by the adversarial
middle-relay to narrow down the number of
$(x \in \mathcal{X}, d \in D)$ pairs to 1 - i.e., to completely
de-anonymize the source and destination - is

$$n = \frac{-\log(|\mathcal{X}| \times |D|)}{\log(E[S_{en,ex}]) - \log(|\mathcal{X}| \times |D|)}$$

(since $\left(\frac{E[S_{en,ex}]}{|\mathcal{X}| \times |D|}\right)^{n} = \frac{1}{|\mathcal{X}| \times |D|}$).
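As a rough numerical check of the two expressions above, the following Python sketch evaluates the lower-bound on linkable pairs and the resulting number of observations for the Brazil (BR) row of Table IV. The function names are ours (hypothetical). The sketch plugs in the rounded interval bound printed in the table, so its value for the expected number of linkable pairs differs slightly from the tabulated one, which presumably uses the unrounded bound.

```python
import math


def expected_linkable_pairs(mu_lower, num_sources, num_dests, safe_fraction=0.50):
    """Lower-bound on (source, destination) pairs sharing one safe (entry, exit) pair.

    mu_lower: lower end of the 99% confidence interval for the mean fraction.
    safe_fraction: the conservative "exactly 50% safe options" assumption.
    """
    return safe_fraction * mu_lower * num_sources * num_dests


def observations_to_deanonymize(expected_pairs, num_sources, num_dests):
    """Solve (expected_pairs / total)^n = 1 / total for n."""
    total = num_sources * num_dests
    return -math.log(total) / (math.log(expected_pairs) - math.log(total))


if __name__ == "__main__":
    # BR row of Table IV: 3,515 source ASes, 165 destination ASes.
    print(expected_linkable_pairs(0.39, 3515, 165))  # rounded CI bound from the table
    # Using the tabulated expected value (114,797) gives n of about 8.2;
    # Table IV lists 8.1.
    print(observations_to_deanonymize(114797, 3515, 165))
```

The number of observations grows only logarithmically with the number of candidate pairs, which is why n stays between roughly 6 and 10 across countries even though the number of source ASes varies by more than an order of magnitude.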


Since Astoria (1) constructs new circuits only if there are
no existing circuits that serve the same destination AS, and
(2) selects middle-relays for each new circuit according to
the bandwidth distribution of relays, we obtain the expected
upper-bound of the probability of a middle-relay being able
to observe $n$ circuits between the same source and destination
ASes (with different entry- and exit-relays). Table IV shows
that this probability (denoted by $P_n$) is negligible even for the
Tor relay with the current highest advertised bandwidth where
the probability of selection as the middle-relay is .007.
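To make the scale of this probability concrete, here is a minimal Python sketch. It assumes that the middle-relay of each circuit is selected independently, so the chance that one relay sits on all of the required circuits is its selection probability raised to the number of circuits. Rounding n down to a whole number of circuits is our assumption (the text does not state it); it appears consistent with the values listed in Table IV. The function name is hypothetical.

```python
import math


def prob_middle_sees_n_circuits(selection_prob, n):
    """Probability that one relay is the middle hop on floor(n) independent circuits."""
    return selection_prob ** math.floor(n)


if __name__ == "__main__":
    # .007 is the selection probability quoted for the highest-bandwidth relay.
    for country, n in [("BR", 8.1), ("CN", 7.8), ("US", 10.1)]:
        print(country, prob_middle_sees_n_circuits(0.007, n))
```

Even for the smallest n in Table IV, the probability stays many orders of magnitude below anything an adversary could realistically exploit with a single relay.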


Defending against attacks due to predictable relay-selection
when there are no safe options. In certain client locations
(e.g., some ASes in China and Iran), there are no safe entry-
and exit-relay selections for some destinations, regardless of
the guards used by the client. In these cases, a relay-level
adversary may place entry- and exit-relays in ASes that provide
a safe path for Astoria clients attempting to connect to specific
target destinations. This manipulates Astoria into using the
adversarial (entry, exit) pair on all circuits connecting the client
to the target destination - allowing trivial de-anonymization of
the user.



Country  $|\mathcal{X}|$  $|D|$  $\mu_{X,D}$  99% CI      $E[S_{en,ex}]$  $n$   $P_n$
BR       3,515            165    .40          (.39, .41)  114,797         8.1   $5.7 \times 10^{-18}$
CN       1,227            131    .44          (.43, .46)  35,216          7.8   $8.2 \times 10^{-16}$
DE       2,022            190    .33          (.33, .34)  63,409          7.1   $8.2 \times 10^{-16}$
ES       703              181    .40          (.39, .41)  25,295          7.2   $8.2 \times 10^{-16}$
FR       1,251            187    .32          (.31, .33)  36,448          6.6   $1.1 \times 10^{-13}$
GB       2,372            187    .35          (.34, .36)  76,473          7.3   $8.2 \times 10^{-16}$
IR       470              133    .39          (.38, .40)  11,878          6.6   $1.1 \times 10^{-13}$
IT       932              201    .29          (.28, .30)  26,800          6.2   $1.1 \times 10^{-13}$
RU       5,868            178    .27          (.26, .28)  140,201         6.9   $1.1 \times 10^{-13}$
US       23,588           188    .45          (.44, .46)  977,768         10.1  $2.8 \times 10^{-22}$

TABLE IV: Results from statistical analysis of the expected upper-
bound of the threat posed by adversarial middle-relays on Astoria
(using data obtained from our simulation experiment E2).


Astoria can defend against such attacks by selecting from 
safe (entry, exit) pairs only when a minimum threshold of 
available safe (entry, exit) pairs is met. In cases where the 
threshold is not met, Astoria may discard the few remaining 
safe pairs and choose entry- and exit-relays according to the 
distribution produced by its linear program (Eq. [^, which 
minimizes the amount of information gained by the network-
level adversary. This, however, enables correlation attacks by
selected network-level attackers. Since it is not yet clear whether
network-level adversaries pose a larger threat than relay-level
adversaries, determining this threshold is a non-trivial open
research problem.
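The threshold-based defense described above can be sketched as follows. This is only an illustration under stated assumptions, not Astoria's implementation: the function name, the parameter names, and the uniform choice among safe pairs are all hypothetical, and `fallback_weights` merely stands in for the distribution produced by the linear program.

```python
import random


def choose_entry_exit(safe_pairs, all_pairs, fallback_weights, min_safe_pairs, rng=random):
    """Pick an (entry, exit) pair, trusting safe pairs only above a threshold.

    safe_pairs: (entry, exit) pairs predicted to be safe from correlation.
    all_pairs: every candidate (entry, exit) pair.
    fallback_weights: one weight per element of all_pairs (stand-in for the
        linear-program distribution; an assumption of this sketch).
    min_safe_pairs: threshold below which the safe pairs are discarded.
    """
    if len(safe_pairs) >= min_safe_pairs:
        # Enough safe options that a few planted relays cannot dominate.
        # Uniform choice is a simplification made for this sketch.
        return rng.choice(safe_pairs)
    # Too few safe options: ignore them and fall back to the distribution
    # that limits what a network-level adversary learns.
    return rng.choices(all_pairs, weights=fallback_weights, k=1)[0]


if __name__ == "__main__":
    safe = [("guardA", "exitA")]
    candidates = [("guardA", "exitA"), ("guardB", "exitB"), ("guardC", "exitC")]
    print(choose_entry_exit(safe, candidates, [0.2, 0.5, 0.3], min_safe_pairs=5))
```

The difficulty noted in the text shows up directly in the `min_safe_pairs` parameter: a high value protects against planted relays but exposes more circuits to network-level correlation, and a low value does the opposite.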

VI. Discussion 

In this section, we compare the Astoria Tor client with 
the hypothetical perfect Tor client and discuss how Astoria 
can be augmented and improved with recent and ongoing 
developments from the network measurement community. 

A. Comparing Astoria and the perfect Tor client 

Here we point out some of the shortcomings of Astoria 
when compared to the perfect Tor client. We find that many 
of these apply to any AS-aware client. The perfect Tor client 
is able to simultaneously achieve three conflicting goals: 

Defend against network-level attackers. The perfect Tor
client is able to prevent compromise from network-level
attackers. In particular, the client constructs circuits that are safe
from traffic correlation attacks.

While such adversaries are largely ignored by the vanilla 
Tor client, Astoria successfully deals with them by utilizing 
efficient path-prediction tools to explicitly avoid relays that 
enable correlation attacks. However, Astoria does not currently 
deal with attacks from active network-level adversaries that are 
able to exploit BGP dynamics. In addition, Astoria is unable to 
exactly predict the paths that will be utilized to communicate 
with each Tor relay, and therefore only makes estimates (which 
are validated to be reasonably tight estimates). 


Defend against relay-level attackers. Since the Tor network 
is volunteer driven, it is critical for the perfect Tor client to 
be able to defend against passive and active attackers that are 
able to control a fraction of all relays within the network. 
This primarily involves (1) constructing circuits so that the 
probability of an adversarial pair of relays occupying the entry- 
and exit-hop of the circuit is low, and (2) ensuring that no 
single relay should be able to conclusively link the source and 
destination of the circuits it is on. 

While the vanilla Tor client is able to successfully mitigate 
threats from many types of relay-level attacks, we find that 
this is challenging for AS-aware clients such as Astoria. First, 
while the concept of entry-guards mitigates many threats from
relay-level attackers, it has a negative influence on the number
of safe circuits that can be built by AS-aware clients. Second,
AS-aware circuits inherently leak some information about the 
source and destination of the circuit. Our analysis in Section 
shows that in the average-case, Astoria circuits are safe 
from de-anonymization due to these leaks. 

Maintain performance and load-balancing. The perfect Tor
client must also perform load-balancing to ensure that no single
set of relays in the network is overloaded, while providing
reasonable performance for all its users.

In Section V, we demonstrated that Astoria performs
load-balancing in an identical manner to the vanilla Tor client and
page-loads are only slightly slower in most cases. There are
two main reasons for Astoria’s increased page-load times: (1)
Path prediction is expensive, and (2) Astoria loses the ability
to pre-emptively construct circuits. While (1) is unavoidable,
there are interesting future research questions regarding (2)
- e.g., can smart caching and pre-emptive/predictive circuit
construction for a set of popular/predicted destinations result
in significant performance gains?

B. Improving path-prediction accuracy 

Measuring the potential threat of correlation attacks is
made challenging by the fact that it requires measuring both
forward and reverse network paths between the client and
entry, and exit and destination, respectively. Thus, we opt
to leverage an up-to-date map of the Internet’s topology,
augmented with inferred business relationships between
networks and a model of routing policies to infer network paths.
Modeling of interdomain routing is a thorny issue and we
take care to avoid well-known pitfalls including complex
business relationships (e.g., ASes that act as a customer in one
geographic region, and a peer in others) and sibling ASes (i.e.,
multiple ASes which correspond to a single organization). The
issue of sibling ASes is particularly relevant in our context,
as multiple ASes controlled by a single organization may
share information to perform a correlation attack. Despite all
this, accurate path prediction remains an open challenge. In a
related study, we validate the accuracy of this approach and
find that measured paths follow this model 65-85% of the time
[10]. As a result, the numbers we observe should be taken as
an estimate of the threat.

We note that novel path measurement tools are on the 
horizon (e.g., Sibyl p7| ) that take into account richer vantage 
point sets than prior work (e.g., PlanetLab used by iPlane 
vs. RIPE Atlas [35] used by Sibyl). An interesting future
direction is determining how such measurement planes can
be integrated into a Tor client (e.g., to operate in an offline
mode or via a secured querying interface).

VII. Conclusions 

We have leveraged highly-optimized algorithmic
simulations of interdomain routing on empirically-derived AS-level
topologies to quantify the potential for correlation attacks
where an adversary can leverage asymmetric Internet routing
and collude with others within the same organization. Our
results show that a significant number of Tor circuits are
vulnerable to AS- and state-level attackers.

To mitigate the threat from such attackers, we developed
Astoria, an AS-aware Tor client. Beyond providing a high
level of security against these attacks, Astoria also has
performance that is within a reasonable distance from the current Tor
client. Unlike other AS-aware Tor clients, Astoria also
considers how circuits should be built in the worst case, i.e.,
when there are no safe relays available to the client. Further,
Astoria is a good network citizen and is designed to ensure
that all circuits created by it are load-balanced across the
volunteer-driven Tor network.

Our work highlights the importance of applying current 
models and data from network measurements to inform relay 
selection so as to protect against timing attacks. Astoria also 
opens multiple avenues for future work such as integrating 
real-time hijack and interception detection systems (to fully 
counter RAPTOR attacks) and understanding how new 
measurement services can be leveraged by a Tor client without 
defeating anonymity. 

Source code: The source code of the Astoria client is available
under the CRAPL license at
http://nrg.cs.stonybrook.edu/astoria-as-aware-relay-selection-for-tor/